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By performing a stochastic dynamic in a space of Slater determinants, the Full Configuration 
Interaction Quantum Monte Carlo (FCIQMC) method has been able to obtain energies which are 
essentially free from systematic error to the basis set correlation energy, within small and sys- 
tematically improvable errorbars. However, the weakly exponential scaling with basis size makes 
converging the energy with respect to basis set costly and in larger systems, impossible. To amelio- 
rate these basis set issues, here we use perturbation theory to couple the FCIQMC wave function 
to an explicitly correlated strongly orthogonal basis of geminals, following the [2]ri2 approach of 
Valeev et ai. The required one- and two-particle density matrices are computed on-the-fly during 
the FCIQMC dynamic, using a sampling procedure which incurs relatively little additional compu- 
tation expense. The F12 energy corrections are shown to converge rapidly as a function of sampling, 
both in imaginary time, and number of walkers. Our pilot calculations on the binding curve for 
carbon dimer, which exhibits strong correlation effects as well as substantial basis set dependence, 
demonstrate that the accuracy of the FCIQMC-F12 method surpasses that of all previous FCIQMC 
calculations, and that the F12 correction improves accuracy equivalent to increasing the quality of 
the one-electron basis by two cardinal numbers. 



I. INTRODUCTION 

The Full Configuration Interaction Quantum Monte 
Carlo (FCIQMC) method has arisen over the last few 
years as a way to obtain near exact ground state energies 
(in the Full Configuration Interaction sense) for Hamil- 
tonians in a given Hilbert space. A range of Hamiltoni- 
ans have been tackled from molecular [IH5] to solid-state 
systems [6HS], obtaining accurate correlation energies in 
Hilbert spaces far beyond the reach of traditional iter- 
ative diagonalizers. This is achieved by stochastically 
sampling the underlying Slater determinant space us- 
ing a discrete 'walker' representation, thereby reducing 
the computational effort spent on the realisation of the 
large number of low-weighted amplitudes in the expan- 
sion. The price of this stochastic algorithm is the in- 
troduction of random errors, which nonetheless can be 
controlled and reduced as the inverse square-root of the 
computational effort. 

However, it has been known since the early days of 
electronic structure theory, that the necessity for these 
huge many-body expansions is to a large extent an ar- 
tifact of the basis functions that are used to represent 
the wavefunction. The Full Configuration Interaction 
(FCI) expansion constructs the wavefunction as a linear 
combination of all possible antisymmetrized spin-orbital 
products, called Slater determinants [SHU] ■ Although the 
evaluation of matrix elements, and the resulting linear 
optimisation problem is relatively straightforward and 
computationally efficient, the size of the determinantal 
basis scales binomially with the number of spin-orbitals, 
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which is the source of the current restriction on FCI- 
based methodology. Moreover, since the spin-orbitals are 
intrinsically one-electron functions, the Slater determi- 
nant basis is ill-suited to some aspects of many-electron 
correlated motion, specifically those caused by the sin- 
gular electron-electron interactions at short range. This 
results in a slow convergence of the electronic energy with 
the size of the underlying basis, hence exacerbating the 
binomial bottleneck. 

Due to this high scaling with basis size, enlarge- 
ment of the basis set and subsequent extrapolation is 
in many cases impossible, particularly for the multi- 
configurational methods required for strongly correlated 
systems. As a consequence, for high-level methods, the 
basis set error is often far larger than the error in the 
correlation treatment within the basis. Even if conver- 
gence with respect to basis size is possible, strong cor- 
relation effects are generally considered to be a 'small' 
basis set problem, and thus a similar high-level treat- 
ment with increasingly large basis sets is an unneces- 
sary burden. Traditionally, this issue has been dealt 
with by partitioning the orbital space into a small, ac- 
tive basis in which the wavefunction is constructed from 
a multiconfigurational FCI-type expansion (12H14j . before 
the dynamical, short-ranged correlation is captured via 
excitations from this wavefunction into a larger, exter- 
nal basis. This is the rationale behind methods such 
as complete active space second-order perturbation the- 
ory (CASPT2)[15l [16], multireference configuration in- 
teraction (MRCI) JTTJ [TH] , canonical transformation the- 
ory (CT)PH HJ] and others[Hl US- However, these 
methods can be expensive, and problems can result if 
the strong correlation is not adequately captured by the 
active basis. 
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In this paper, we take a different approach, and at- 
tempt to overcome the slow basis set expansion by in- 
cluding gcminal functions in the expansion, which per- 
turbatively couple to the FCIQMC wavefunction. Hyller- 
aas's pioneering work in 1929 demonstrated that a highly 
compact form for the short-ranged wavefunction behav- 
ior can be found by introducing an explicit dependence 
on the interelectronic coordinate, r\-j 23 2o . This natu- 
rally describes the dynamical correlation problem, which 
covers both the exact form of the 'cusp' at ri 2 = 0, 
and the longer ranged coulomb hole around each elec- 
tron pair coalescence point. These cusp conditions, de- 
rived initially by Kato [2l)l - l2"5] . result from the singulari- 
ties in the coulombic electron-repulsion potential, which 
are balanced by derivative discontinuities in the wave- 
function. Neither these cusps, nor the short to medium- 
range coulomb holes are well approximated by an expan- 
sion of the wavefunction in terms of Slater determinants 
formed from Gaussian-type orbitals, that are centered on 
the nuclei of the system. 

The umbrella of 'explicitly correlated' methods covers 
ways to introduce this explicit dependence on r 12 into 
the wavefunction ansatze, including the use of exponen- 
tially correlated Gaussians[5U] and the transcorrelated 
method[30, 31 . These methods have fewer variational 
parameters to optimize, however, the presence of many- 
electron (more than two) integrals, and a more compli- 
cated optimization problem has restricted these methods 
to small systems, and their use is not yet routine. An 
exception to this is the variational Monte Carlo method, 
where the need for integral evaluation is avoided alto- 
gether, by direct stochastic evaluation of the local energy 
for the optimization of a non-linear Jastrow form for the 
wavefunction . 

The R12/F12 methods [33H35] augment the set of 
Slater determinants with a small number of determinants 
that include two-electron geminal functions[36-38j. The 
resulting many-electron integrals are avoided, or approx- 
imated via resolution of the identity (RI) (351 EUJ I3"9"H4T"] . 
The result is a method, which for the same level of the- 
ory approaches the basis set limit much more rapidly [32], 
thus greatly reducing the computational effort required 
for a given target accuracy. This approach has been 
in active development for many years, and is now at a 
highly advanced stage for single reference methods, where 
it is routinely used in a black-box fashion, and present 
in many quantum chemistry packages. However, apart 
from some early work [43, 44J , multi-reference F12 meth- 
ods have only appeared relatively recently [43H53] , even 
though the F12 approach offers great benefits in these 
cases, where the scaling with respect to basis size is of- 
ten at its most prohibitive. 

In this paper, we extend the stochastic FCIQMC 
method to include a perturbative coupling from the FCI 
determinant amplitudes, to a set of Gaussian-type gem- 
inal functions which explicitly correlate all pairs of or- 
bitals in the FCIQMC basis (OBS). This is in contrast 
to single-reference methods, where only the occupied or- 



bitals of the reference determinant are correlated. We 
follow the a posteriori [2]ri2 approach of Torheyden and 
Valeev[3B], where the geminal basis is used to compute 
a basis set incompleteness correction for the dynamic 
correlation energy using second-order perturbation the- 
ory. The geminal contribution is not optimized, but cho- 
sen to satisfy the derivative discontinuity at the elec- 
tronic cusps, and accurately represent the shape of the 
coulomb hole. Furthermore, we neglect the effect of re- 
laxation of the FCIQMC amplitudes and directly use 
the one- and two-particle reduced density matrices com- 
puted from the FCIQMC procedure. Since the gemi- 
nals are evaluated at a different level of theory to those 
FCIQMC amplitudes in the OBS, strict variationality 
with respect to the FCI complete basis set (CBS) is lost. 
Nevertheless, we will demonstrate that the perturbative 
level of theory for geminal amplitudes is sufficiently accu- 
rate and that convergence to the FCI-CBS limit is much 
faster with our hybrid approach, where FCIQMC cap- 
tures the strong correlation effects from the OBS, while 
the cheaper, polynomially-scaling, but still multirefer- 
ence F12 corrections are used for the remaining dynamic 
correlation. This 'diagonalize-then-perturb' approach of- 
fers a balanced description of the electronic correlation, 
while only requiring the small overhead from accumulat- 
ing the density matrices on-the-fly. 

The geminal basis corrects for the incompleteness in 
the description of pair correlations, but there remains 
an incompleteness due to the finite basis for the one- 
electron description. To remedy this, we modify the 
[2]s method of Kong and Valeev[54] designed to cor- 
rect for basis set incompleteness in CASSCF wavefunc- 
tions, to provide a similar multireference a posteriori ap- 
proach to one-particle incompleteness. For single ref- 
erence methods, it has been demonstrated numerically 
that the one- and two-particle incompleteness errors are 
largely decoupled [4"7l 155]. Indeed, the proposed correc- 
tions are independent and can be simply applied addi- 
tively. When both one- and two-electron corrections have 
been applied to the FCIQMC energy, we will denote the 
result FCIQMC-F12. 

However, obtaining these corrections is not as straight- 
forward as for many other deterministic methods. The 
strength of the FCIQMC approach lies in the fact that 
only a small fraction of the space is occupied at any 
one iteration, and therefore accurate energies and wave- 
functions only emerge after appropriate time-averaging 
of the walker dynamic. However, explicit averaging of 
the FCIQMC wavefunction to obtain appropriate den- 
sity matrices would negate many of the advantages of 
the method, requiring substantially increased storage and 
computational costs. Therefore, accumulation of accu- 
rate time-averaged one- and two-particle reduced density 
matrices of the FCIQMC wavefunction required for the 
corrections must be performed 'on-the-fly', in a manner 
that neither becomes a substantial computational burden 
nor reduces the parallelism of the algorithm. In addition, 
the additional storage requirements of such an algorithm 
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should ideally remain modest, and of course, must not 
scale with the size of the FCI space. 

In this paper, we will demonstrate the potential of the 
FCIQMC-F12 method by first considering the the sim- 
ple case of the dissociation of the hydrogen molecule, to 
demonstrate some salient features of the approach, and 
the necessity for multireference F12 corrections in the 
presence of strong correlation effects. We will then dis- 
cuss the quality of the on-the-fly stochastically sampled 
reduced density matrices, and their convergence both 
with elapsed imaginary time, and number of walkers in 
the space. Finally, a detailed study is performed on the 
carbon dimer, a strongly correlated molecule, which is 
chosen since much work with FCIQMC has already been 
performed for this system (see Ref. [5)) ■ Accurate results 
for the carbon dimer are presented, and the outlook for 
the approach considered. 



II. METHODOLOGY 
A. FCIQMC recap 

A brief overview of the FCIQMC method is given here, 
with more details provided in Ref. [THS1 and [51 Assuming 
a basis of M orthogonal one-particle orbitals ({p, q, . . .}), 
an explicitly antisymmetric basis of ./V-electron Slater de- 
terminants can be formed, {-Di, Dj, . . . }, which exactly 
span the FCI-space. Exact diagonalization of the Hamil- 
tonian in this basis provides the FCI energy (defining 
the basis-set correlation energy), and wavefunction com- 
ponents on each of the determinantal basis functions 
({Ci, Cj, . . . }), though its drawback is that the dimen- 
sion of this matrix scales factorially with both M and 
N. In FCIQMC, the description of the determinantal 
amplitudes is coarse-grained through a discrete 'walker' 
representation of the wavefunction. This allows for a 
compression of the instantaneous representation of the 
wavefunction, and corresponding reductions in computa- 
tional effort and storage. A set of stochastically realised 
rules are then iteratively applied to each walker, and the 
energy averaged over these iterations, until the desired 
convergence is reached. 

The master equations governing the stochastic walker 
dynamic can be derived by starting with the imaginary- 
time Schrddinger equation, 



By performing an integration to large imaginary time, r, 
the ground-state wavefunction is projected out from the 
set of all stationary solutions, in keeping with the ap- 
proach of all projector-based methods. A set of coupled 
differential equations is found, 

AC- 

-^ = (H n -E -S)C i + J2H i3 C h (2) 



where Ci are the determinant coefficients, and S is intro- 
duced as an energy-offset 'shift' parameter, which acts 
as a population control parameter, and provides an esti- 
mate of the correlation energy when it is allowed to vary 
to keep the walker population constant. 

The coefficients in Eq. Q are discretised, and repre- 
sented as the signed sum of walkers on the determinant. 
The second term in the equation is simulated each iter- 
ation as a stochastically realised spawning criterion be- 
tween connected determinants, while the first represents 
a diagonal death step. Annihilation events occur at the 
end of each iteration between oppositely signed walkers 
residing on the same determinant. However, brute force 
application of this procedure will only converge onto the 
ground state assuming that a system-specific number of 
walkers is exceeded, since annihilation events need to be 
numerous enough in order to overcome the 'sign problem' 
present in the sampling of the space. If this is achieved, 
then at convergence, the signed number of walkers on any 
determinant will be proportional to their FCI coefficient. 

The number of walkers required to achieve convergence 
to FCI accuracy within small error bars is dramatically 
accelerated by invoking a systematically improvable ap- 
proximation, termed initiator or i-FCIQMC, which will 
be used exclusively in this work. In this, the growth of 
the occupied determinant space is controlled, such that 
previously unoccupied determinants can only become oc- 
cupied if they are spawned onto from a determinant with 
a population exceeding a preset parameter n a dd- The 
rationale behind this is that determinants with weights 
larger than n a( jd are likely to have their sign established 
correctly with respect to the rest of the instantaneous 
wavefunction, and therefore the uncontrolled propaga- 
tion of noise from competing signed solutions of the prob- 
lem should be limited. Since the walker population on 
any determinant is constantly changing, so does the space 
which can successfully spawn onto the unoccupied deter- 
minants, and as such the approximation cannot be writ- 
ten as a simple change to a static Hamiltonian matrix, 
but rather one which utilises efficient error cancellation 
within a time-averaged dynamic. The dynamic rigorously 
converges to the original scheme as the walker number is 
increased, or n a dd is reduced, which has been shown in 
several systems to provide an exponential saving over the 
original formulation. 

The energy can be extracted from the dynamic via a 
non- variational projection onto a reference wavefunction, 
D , 

E pmi =E + J2(Di\H\D )^, (3) 

which in this work is chosen to be the instantaneously 
largest weighted single determinant in the space (which 
was the Hartree-Fock determinant for geometries close 
to equilibrium). Alternatively, once the walker popula- 
tion has equilibrated, the value of the shift parameter, 
S, which renders the propagated FCIQMC wavefuncion 
L 1 norm-conserving can also be used as an estimate for 
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the correlation energy, which when averaged over imag- 
inary time is denoted Eg. Due to the independence of 
this estimate from a reference state, it is generally used 
for the highly multireference stretched geometries. As we 
will see later, it is also possible to calculate another en- 
ergy estimate from the trace of the reduced Hamiltonian 
with the sampled two-electron density matrix, but this 
in not generally the estimate which is used, due to its 
convergence properties which are discussed later. 



B. [2] R i2 and [2]s 

The [2]ri2 and [2]g methods were developed by Valeev 
and co-workers as perturbative basis set completeness 
corrections that can in principle be applied to any elec- 
tronic state for which the one- and two-particle reduced 
density matrices are available [IrJl Wf[ 154"] . Here we apply 
these corrections to the situation where the FCIQMC 
correlation energy has been determined in a computa- 
tional basis of spin-orbitals {p, q,r, s}. Consider the 
Mukhcrjcc-Kutzclnigg normal-ordered Hamiltonian 



H = E° + F + G. 



(4) 



where E° = (Q\H\0) is the energy of the reference zeroth- 
order FCIQMC wave function |0). The effective Fock 
operator F, in a formally complete one-particle basis de- 
noted by orbital indices {k, A}, is given by 



(5) 
(6) 



where 7^ is the one-particle reduced density matrix which 
spans the one-particle basis of |0), g£j = {np\\q) are an- 
tisymmetric electron repulsion integrals, are the el- 
ementary normal-ordered operators in the Mukherjee- 
Kutzelnigg sense, and Einstein summation convention is 
assumed. The zeroth-order Hamiltonian is chosen to be 

H° = E a + PFP+(l-P)F(l-P) + PGP (7) 

where P projects onto the computational basis {p, q}, 
used for the FCIQMC calculation. 

The [2]s energy correction is computed by second-order 
perturbation theory in a basis of states generated by 
singly exciting from the reference |0) into complemen- 
tary auxiliary (CA) orbitals {a',b'}, orthogonal to the 
computational basis {p, q}. 



Xpq = Ipq ~ %lq +7p7g i s the two-electron cumulant and 
7jj and are the one- and two-particle density matrices. 
We also investigated the performance of the zeroth order 
Dyall Hamiltonian [53], but found that the convergence 
was less numerically stable. 

The [2]ri2 energy correction is computed by second- 
order perturbation theory in an internally contracted ba- 
sis of Gaussian geminal-containing Slater determinants 
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The operator O ensures the strong orthogonality (one- 
electron orthogonality) condition of the geminal basis to 
the reference wavefunction, and the matrix elements 



--S xy (K\\Qi2f(ri 2 )\xy) 



(14) 



are the representation of the projected Gaussian geminals 
in second quantization. The projector Q12 = 1 — P1P2 
is used in order to simplify the working equations, and 
the rational generator S xy = (| + ^p xy ), where p xy in- 
terchanges the spatial components of the spin-orbitals x 
and y[56, 57 , is used to impose the appropriate cusp con- 
ditions. The Gaussian geminals f{r\2)\xy) are defined 
by the orbital space {i>,u>, x,y}, which is chosen to be 
the full computational basis {p, q, r, s} and the correla- 
tion factor f(r±2) is a linear combination of six Gaussians 
that closely fit an exponential — 7 -1 exp(— 77-12) [58], ac- 
cording to Ref. [55]. The [2] R12 energy correction is given 
by the Hylleraas functional 
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where the many-electron integrals are approximated us- 
ing standard R12/F12 manipulations and through RI in- 
sertions of the basis {p' , q' ,r'}, defined as the union of the 
computational basis {p, q, r, s} and the CA basis {a', b'}. 
This results in the working equations 
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where the amplitudes t q a , are calculated from the solution 
of 

^' q tl,=-^fy (10) 

Afi =5 a b >Pq{Ks - ^slD + fb'Yq ■ (H) 
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where P(pq)O pq = O pq - O qp , \x*) = {ft + k* )\p>) and 
/ and k are the Fock and exchange matrix elements (see 
Eq. [6| . The geminal integrals are defined through 

VV =S,y{p'q'\f(ri2)\xy) (20) 

u pY =S X y(p'q'\riif(ri2)\xy) (21) 

x x Z =S vw S xy (vw\f(r 12 ) 2 \xy) (22) 

tZI =S vw S xy {vw\^ x !{r 12 )f\xy) (23) 

The geminal amplitudes, ij| are not optimized, but se- 
lected to satisfy the coalescence conditions, which due 
to the presence of the rational generator, reduce to 
^xy = Sxpfiyq ~ ^xqS yp - The above equations are essen- 
tially those of Kong, Torheyden and Valeev, adapted for 
the present purpose. The spin-adapted formalism pre- 
sented in Ref. 07] recasts the equations in terms of spin- 
free orbitals, although also introduces an approximation 
to remove the appearance of 4-RDMs in the formal theory 
when calculating the B intermediate. This modification 
is not expected to change results greatly [60 . Our imple- 
mentation reduces exactly to the single reference MP2- 
F12/2C* F12 correction |6TtJS3"] if the diagonal Hartree- 
Fock density matrices are used. All integrals were com- 
puted using the Dalton program|64 and were combined 
with stochastically generated one- and two-particle den- 
sity matrices from FCIQMC in a stand-alone program. 



III. DISSOCIATION OF HYDROGEN 

In this section, we provide a simple example to illus- 
trate the importance of an F12 correction that takes into 
account the multireference nature of the underlying wave- 
function. The hydrogen molecule provides a simple two- 
electron system which nonetheless encapsulates many of 
the problems in ab initio quantum chemistry. At equilib- 
rium and compressed bond lengths, dynamic correlation 
dominates, and both one- and two-electron basis set in- 
completeness is generally significant. However, as the 
bond is stretched, the system becomes intrinsically mul- 
tireference, as unphysical ionic terms in the Hartree-Fock 
determinant need to be canceled by a similarly weighted 
excited determinant. At the dissociation limit, the sys- 
tem reduces to two independent one-electron hydrogen 
atoms, with no local correlation between the two elec- 
trons. By size extensivity, the energy of this system must 
be exactly -lEh- However, even if the static correlation 
is exactly captured, this will still only be achieved in the 
complete basis set (CBS) limit since there remains an in- 
completeness in the one-electron description, which will 
simply be twice the error in the Hartree-Fock energy of 
a hydrogen atom in the same basis. This therefore pro- 
vides a good system to consider both dynamic and static 
correlation effects, as well as separating the basis set in- 
completeness in both the one- and two-electron part of 
the wavefunction. 



The binding curve for hydrogen in a cc-pVDZ basis 
is shown in Fig. [T] Second-order Moller-Plesset the- 
ory (MP2) gives a qualitatively incorrect description for 
bond lengths greater than ^2.5ao due to the increasingly 
static correlation effects, whereas the Coupled-Cluster 
singles and doubles (CCSD) energies agrees exactly with 
FCIQMC (and FCI) values at all bond lengths, as ex- 
pected for a two-electron system. However, the CCSD- 
F12 methods all fail to dissociate H2 correctly, with errors 
in the dissociation energy up to 7mEh- The FCIQMC- 
F12 method achieves the exact dissociation energy to 
within 0[/iEh], which highlights the necessity for an 
F12 correction that explicitly correlates a multireference 
wavefunction. 



Further insight can be obtained from the individual 
contributions to the basis set incompleteness in Fig. [2] 
The reason for the failure of the CCSD-F12 methods is 
that the geminal amplitudes are selected such that they 
satisfy the coalescence conditions of the first-order wave 
function, that is, they explicitly correlate the Hartree- 
Fock orbital pairs only. At infinite separation, the hy- 
drogen molecule reverts to a sum of two atomic hydrogen 
atoms, and since the two electrons are now spatially sep- 
arated the two-electron F12 contribution should be rig- 
orously zero. This spuriously does not occur for CCSD- 
F12, because the Hartree-Fock determinant still contains 
terms with both electrons localized to the same atomic 
fragment. CCSD-F12 variants where the geminal ampli- 
tudes are optimised rather than fixed do dissociate H2 
correctly, since the geminal amplitudes optimise to zero 
at infinite separation, but these methods suffer from nu- 
merical instabilities and geminal basis set superposition 
errors [65] and are not recommended. In the FCIQMC- 
[2]ri2 approach, all orbital pairs are explicitly correlated 
and the two electron correction naturally tends to exactly 
zero, as required. This example does not intend to high- 
light a deficiency of CCSD-F12 theory, since for systems 
of more than two electrons the parent method would also 
fail for bond-breaking processes, but rather highlight the 
importance of an explicitly correlated geminal basis com- 
posed of more than the occupied Hartree-Fock orbitals 
in the presence of strong static correlations. 



Similar considerations apply to the one-electron basis 
set incompleteness correction, although this should not 
now go to zero at the dissociation limit. For increasing 
bond lengths the FCI basis set incompleteness tends to a 
sum of the Hartree-Fock basis set errors for the individ- 
ual hydrogen atoms. Again, the correction for this based 
on a single Hartree-Fock reference is shown to be in- 
adequate for long bond lengths where strong correlation 
effects are large. However, the FCIQMC- [2] g approach 
almost exactly corrects for the one-electron error over 
the whole binding curve, resulting in a total energy at 
dissociation of -lEh- 
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FIG. 1. Binding curves of H2 in a cc-pVDZ basis. The sys- 
tem takes on a more multireference character as the bond is 
stretched. Dissociation to two hydrogen atoms should yield 
an exact energy of -lEh. However, despite agreement between 
CCSD and FCIQMC, the single-reference F12 corrections all 
fail to successfully obtain the correct dissociation limit, with 
only the fully multireference FCIQMC-F12 energy getting this 
exactly (error C[/iEh]). The individual corrections can be seen 
in Fig. [2] All calculations used a cc-pV6Z-RI CABS basis, and 
a 7 of l.Oajj" . CCSD calculations were run with M0LPR0. 



IV. REDUCED DENSITY MATRICES FROM 
FCIQMC 

In order to construct the perturbative coupling to the 
geminal basis, it is necessary to obtain the one- and two- 
electron reduced density matrices (RDM). Much of the 
benefit of the FCIQMC approach would be lost if the 
time-averaged wavefunction over the entire FCI space 
was required in order to construct these, and even an 
explicit consideration of all occupied determinants con- 
nected via the desired RDM each iteration would become 
impractical when extending the approach to larger sys- 
tems. Despite this, an accurate extraction of the RDM 
information from the time-averaged FCIQMC dynamics 
is essential for this method, while many other important 
molecular properties, such as nuclear gradients, dipole 
moments and polarizabilities can be obtained from the 
density matrices via a trace of the observable operator 
with the appropriate density matrix. 

The reduced density matrices over all orbitals in the 
OBS can be defined in second quantization as 

7 « =<*|fi&|*) (24) 
% s g =<*|otata.a r |*) J (25) 

where the distinction between the one- and two-body 
matrices is evident from the rank of the tensor. Since 
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FIG. 2. Individual one- (dashed) and two- (solid) particle cor- 
rections to different methods across the dissociation of H2, as 
given in Fig. [I] Although there is no two-particle correlation 
at dissociation, only the [2]rti2 F12 correction to the FCIQMC 
wavefunction correctly goes to zero in this limit. The multiref- 
erence [2]s correction for single-particle incompleteness to the 
FCIQMC wavefunction then provides the remaining energy to 
give the exact FCIQMC-F12 dissociation limit. The CABS 
singles approach (CABSs) overestimates the energy, while the 
single-reference F12 contributions to the different methods all 
converge to non-zero contributions. 



the Hamiltonian operator that is sampled during the 
FCIQMC dynamic contains the same excitation rank as 
these matrices, it is possible to devise a way to sample 
the one- and two-body density matrices on-the-fly and 
stochastically, during the FCIQMC dynamic. The contri- 
butions must the be appropriately unbiased for the mag- 
nitude of the Hamiltonian matrix element between indi- 
vidual connected determinant pairs. The precise manner 
in which this is done will be described elsewhere. The ac- 
cumulation of the reduced density matrices begins once 
the simulation has reached equilibrium and continues for 
a number of iterations until sufficient statistical accuracy 
is obtained, before the matrices are normalized accord- 
ing to their required trace relation. As will be shown, in 
many cases the density matrices converge rapidly with 
iterations, resulting in a relatively small computational 
overhead for their accumulation. 

Ignoring potential errors in the description of the wave- 
function due to the initiator approximation, two errors 
can arise due to this on-the-fly sampling of the density 
matrices. Since they are accumulated via a sampling of 
the Hamiltonian, it requires that a non-zero Hamiltonian 
matrix element exists between all pairs of occupied de- 
terminants which are connected via one- and two- body 
excitation operators. If this is not the case, then no walk- 
ers can spawn between these determinants, and therefore 
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no contribution can be made to the density matrices from 
this pair, introducing a bias into the matrices (but not 
the variational energy from the matrices). Determinants 
between different symmetry blocks of the Hamiltonian 
are not connected, however since there will be no weight 
on determinants outside the currently sampled symmetry 
block, this does not introduce an error. 

The only instance of this criteria being rigorously un- 
fulfilled is in the case of Brillouin's theorem when the one 
particle basis consists of canonical Hartree-Fock orbitals, 
and Hamiltonian matrix elements between the Hartree- 
Fock determinant and all of its single excitations are rig- 
orously zero, despite potentially significant weights on 
these determinants. To account for this, and improve 
the description of the RDMs, all connections between the 
Hartree-Fock determinant and its single and double ex- 
citations, are considered explicitly during each iteration. 
This removes any error due to Brillouin's theorem, and 
improves the quality of the RDMs with negligible compu- 
tational overhead. The small possibility that two signifi- 
cantly weighted symmetry-similar determinants are con- 
nected via a single or double excitation operator, but 
with a statistically zero Hamiltonian matrix element be- 
tween them, remains a potential source of error. 

The other possible error from this sampling of can arise 
if instantaneous wavefunction values are used, rather 
than walker populations averaged in time, i.e. the as- 
sumption that (CiCj) = (Ci)(Cj), where the averages are 
over the iterations of the FCIQMC run during accumu- 
lation of the density matrices. This assumes no serial 
correlation between the errors in the determinant popu- 
lations on Di and D- } , and therefore enough walkers such 
that any correlated fluctuations in the amplitudes do not 
introduce a bias. Since the RDMs are functions of prod- 
ucts of these determinant coefficients, this error will not 
decay to zero simply with increasing sampling time. 

However, as opposed to the first error, this bias is sys- 
tematically improvable, and will rigorously vanish in the 
limit of a large number of walkers. To ameliorate this er- 
ror when not at this limit, an averaged walker population 
over the duration of non-zero occupancy is maintained for 
each occupied determinant. Since this is only calculated 
over the instantaneously occupied subspace, it again in- 
volves only small additional effort, and weights were not 
maintained across the whole space. However, an error is 
still present since this averaged walker population infor- 
mation is lost once a determinant becomes unoccupied. 
An important question is the rate at which any error de- 
cays as the total walker population increases, especially 
in comparison to the decay of the initiator error and the 
size of the random errors. This rate of convergence is 
likely to depend on the specific property that the density 
matrices are being used to calculate, and in this paper, 
we will consider the convergence of the error in the calcu- 
lation of the F12 corrections and energy estimators with 
increased walker number, as well as the convergence with 
increased iterations. 



A. Convergence of Density Matrix properties 

To investigate the convergence of the density matri- 
ces with walker number and imaginary time, we consider 
both the equilibrium (f.3 A) and stretched (7.0 A) ge- 
ometries of the frozen-core carbon dimer molecule in a 
restricted Hartree-Fock basis, with a geminal exponent 
fixed to equal 1.0a^~ . These two bond lengths highlight 
vastly different correlation effects and wavef unctions, to 
provide contrasting tests for the accuracy of the sam- 
pled density matrices. The equilibrium geometry is pre- 
dominantly single-reference, dominated by the weight at 
the Hartree-Fock determinant, while the stretched case is 
highly multiconfigurational, with no single determinant 
contributing overwhelmingly to the electronic structure. 
Fig. [3] shows the convergence of various estimators used 
in this work with increasing walkers in a cc-pVDZ ba- 
sis set, while Fig. [4] extends this to the larger cc-pVTZ 
basis. It should be noted that these calculations are in- 
dependent to the ones calculated in RefJSl since angular 
momentum symmetry was not explicitly included in these 
calculations. This resulted in a larger space of 2.25 x 10 10 
symmetry- allowed determinants in the cc-pVTZ basis, al- 
though through the use of time-reversal symmetry, the 
number of distinct ./V-electron functions is about half this 
number. 

The -Erdm energy estimator is calculated from the 
trace of the two-electron RDM with the reduced two- 
electron Hamiltonian, 



E 



RDM — 



E 



pq r»pqrs 



= Tr[ 7 fc], 



where 



k 



pq 
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(26) 



(27) 



This should not be confused with the projected or shift 
energy estimator from the FCIQMC calculation, which is 
used to calculate the energy in FCIQMC, and is defined 
in Eq. Generally, Eq. (26) would define a strictly 



variational energy for the wavefunction, and be equiva- 
lent to the pure estimate | W) . However, since the 
RDM is not explicitly calculated, but rather stochas- 
tically sampled, it may not strictly correspond to the 
FCIQMC wavefunction at all times. This therefore has 
the potential to break strict N-representability, and thus 
return a non- variational energy, however this is not found 
in this study and always represents a variational estimate. 
The effect of the correlated sampling error, compounded 
by the variational constraint, means that the -Erdm en- 
ergy estimator is found to converge to the FCI energy 
of the system far slower with respect to walker number 
compared to the projected or shift energy, as shown in 
Fig. [3] For this reason, it is not used as an estimate of 
the FCI basis set energy of the system. 

However, this is not found to be the case for the 
perturbative basis set incompleteness corrections, with 




FIG. 3. Convergence of the energy estimates and F12 cor- 
rections with increasing walker number, for C2 in a cc-pVDZ 
basis set, at equilibrium (1.3 A) and stretched (7.0 A) ge- 
ometries. The dotted lines represent the stochastic errors on 
the most accurate -E pr oj or Es values, relative the final [2]ri2 
and [2]s results. The density matrices were accumulated over 
100,000 iterations. 



FIG. 4. Convergence of the energy estimates and F12 cor- 
rections with increasing walker number, for C2 in a cc-pVTZ 
basis, at equilibrium (1.2425 A) and stretched (7.0 A) ge- 
ometries. The dotted lines represent the stochastic errors on 
the most accurate -E pro j or Es values, relative the final [2]mj 
and [2]s results. The density matrices were accumulated over 
100,000 iterations. 



[2]ri2 and [2]s based on the sampled density matrices 
also shown. These indicate convergence to approximately 
within the stochastic errors of the final FCIQMC energy 
estimate when the walker number exceeds 3.5 x 10 6 , for 
both the equilibrium and stretched geometries. These 
errors were 0.17mEh in the projected energy (_E pro j) for 
the equilibrium case and 0.43mEh in the shift estimate 
(Es) for the stretched case. Intriguingly, the convergence 
is faster for the multiconfigurational stretched case, al- 
though less monotonic. In the stretched case, the two- 
particle F12 correction is only ~66% of the correction 
at equilibrium, reflecting the increased separation of the 
electrons localized to each atom and thus reduced ener- 
getic importance of the cusps between them. However, 
the magnitude of the incompleteness in the one-electron 
space is virtually unchanged, and is approximately half 
the size of the [2]ri2 correction. 

The convergence in the cc-pVTZ basis (Fig. |4| shows 
a similar trend, with large errors remaining in the en- 
ergy from the density matrices (£rdm), while the F12 
corrections converge at a much faster rate. Convergence 
to approximately within the errorbars of the FCIQMC 
energy (0.19mEh for the equilibrium case and 0.25mEh 
for stretched) is achieved within a sampling of 15 million 
walkers. The magnitude of the [2]bj.2 basis set correc- 



tions are only approximately a third of the size in the 
cc-pVTZ basis compared to the cc-pVDZ basis, while the 
[2]s correction is reduced even further to approximately 
a quarter of its previous size. This reflects the increased 
correlation captured by FCIQMC in the larger orbital 
basis, while indicating that this improves the accuracy of 
the one-body description more than the two-body case. 

The observation that the F12 corrections converge 
much faster than the energy estimate from the density 
matrices can be rationalized by considering the limiting 
case of a diagonal Hartree-Fock density matrix. This 
would recover the single-reference F12 and CABS singles 
corrections of MP2, which are already quite accurate at 
equilibrium, whereas the correlation energy computed us- 
ing -Erdm is by definition zero. It is key to note that in 
both cases, the convergence of the F12 corrections with 
walker number to within the stochastic errors of the i- 
FCIQMC energy is not significantly slower than the num- 
ber of walkers required to acceptably remove the effects 
of the initiator approximation in the FCIQMC energy 
estimate, and therefore the additional expense of calcu- 
lating the F12 corrections is not a large fraction of the 
overall computational cost. 

Another key consideration is the convergence of the 
F12 corrections with respect to sampled imaginary time. 
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FIG. 5. Convergence of the energy estimates with imag- 
inary time spent sampling the density matrices for C2 in a 
cc-pVDZ basis for both equilibrium (1.3 A) and stretched 
(7.0 A) geometries. 3.5 million walkers sampled the space. 
Dotted lines show typical error bars in FCIQMC projected 
energy and shift, taken to be the same as in Fig. [3] 



Fig. [5] shows this for the cc-pVDZ basis, again at both 
equilibrium and stretched geometries. In contrast to the 
convergence with walker number, significant differences 
exist between the two geometries, with the more mul- 
ticonfigurational wavefunction at stretched geometry re- 
quiring many more iterations to reach energy convergence 
to within typical stochastic errorbars of the z-FCIQMC 
energy estimator. Despite this, the number of iterations 
required to sample the density matrices to obtain this ac- 
curacy in the F12 estimates is again not generally more 
than than required to obtain typical errorbars in the cal- 
culation of the FCIQMC energy estimate (0[lCT 4 ]E h ). 
Although not shown, the cc-pVTZ basis exhibits a simi- 
lar convergence. 



V. THE CARBON DIMER 

The carbon dimer has been the focus of a recent 
FCIQMC investigation, where its strong correlation ef- 
fects, state crossings of the same Abelian symmetry 
group, avoided crossings and large basis set incomplete- 
ness provided a stringent test of any method[5]. Despite 
enlargement of the basis to quadruple-zeta quality, it was 
evident that large basis set errors remained. This is now 
tackled within the framework of the explicitly correlated 



approach in order to improve upon these results. Follow- 
ing the convergence investigations shown in Figs. [3] and 
|4j the cc-pVDZ and cc-pVTZ C 2 binding curves were cal- 
culated with FCIQMC-F12 using 3.5 x 10 6 and 13 x 10 6 
walkers respectively, for each geometry. The 2-RDM was 
stochastically constructed once each simulation reached 
constant walker mode. Angular momentum symmetry 
is not explicitly imposed in our calculations. Therefore, 
to avoid complications arising from state crossings and 
metastable convergence to excited states 0, we consid- 
ered bond lengths less than 1.6 A where the ground 
X X E+ state is dominant, as well as geometries near disso- 
ciation with bond lengths more than five times the equi- 
librium. 

In Fig. [6] we plot the deviation between FCIQMC and 
FCIQMC-F12 energies with various basis sets and our 
reference FCIQMC-F12/cc-pVTZ values, as a function 
of bond length. In Table [I] we present non-parallclity 
errors (NPE), defined as the maximum absolute devia- 
tions over the range of geometries considered. The F12 
corrections reduce the NPE of the FCIQMC/cc-pVDZ 
method by more than a factor of seven, which is compa- 
rable with the NPE of the FCIQMC/cc-pVQZ energies. 
However, the cc-pVQZ basis requires over 30xl0 6 walk- 
ers to converge the FCIQMC energy [5], even when when 
angular momentum symmetry is utilised. In contrast, 
the cc-pVDZ basis only requires a tenth of this num- 
ber, even without the additional symmetries, highlight- 
ing the saving achieved with the hybrid F12 approach 
within FCIQMC. The effect of including the whole range 
of bond lengths is unlikely to affect the NPEs, since the 
dominant basis-set incompleteness error is found at equi- 
librium and compressed geometries, where the electrons 
are in close proximity, and dynamical correlation is at its 
largest. 

It is this fact which makes the calculation of the dis- 
sociation energy so sensitive to basis set incompleteness. 
The calculation was performed using the energy of the 
system at a dissociated geometry of 6.21265 A rather 
than twice the atomic energy. This provided a highly 
multiconfigurational state with which to test the ap- 
plication of the corrections, although the uncorrected 
FCIQMC energies were checked for any size-consistency 
error in this value relative to the atomic reference sys- 
tems. The results can be seen in Table[jJ compared to the 
experimental dissociation value which is approximately 
corrected for zero-point energy and core-valence correla- 
tion effects. A similar trend to the NPEs is observed, 
with the F12 corrected FCIQMC value in a cc-pVDZ 
basis achieving accuracy just shy of the uncorrected cc- 
pVQZ result. 

Despite the large improvement of the cc-pVTZ ba- 
sis FCIQMC-F12 result, the dissociation energy is still 
not within chemical accuracy of the experimental value. 
When taking into account random errors of the FCIQMC 
method, as well as uncertainty in experimental results, 
these values differ by 1.6(6)kcalmol^ 1 . From a calcu- 
lation of the all-electron cc-pVTZ FCIQMC-F12 energy 
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FIG. 6. The errors relative to the FCIQMC-F12 results in 
the cc-pVTZ basis, for the C2 binding curves calculated using 
the cc-pVDZ, cc-pVTZ and cc-pVQZ basis sets, as well as the 
cc-pVDZ energies with F12 corrections and extrapolated cc- 
pV(TQ)Z energies|67|. Only the 0.9 - 1.6 A region is shown, 
as all these calculations converge onto the X state. With 
F12 corrections, the NPE across the entire cc-pVDZ binding 
curve is reduced from 86.8(4) to 12.3mE h . The NPEs are 
compared in Table]!] The vertical line denotes the equilibrium 
geometry. 



Basis 


NPE 
mEh 


D c 

1 kcal mol -1 


cc-pVDZ 


86.8(4) 


130.0(1) 


cc-pVTZ 


21.0(6) 


139.9(3) 


cc-pVQZ 


4.9(6) 


143.3(2) 


cc-pVDZ+F12 12.3 


142.3 


cc-pVTZ+F12 




145.3 


Expt. 




146.9(5) 



TABLE I. The non-parallelity error and dissociation energies 
obtained from each of the C2 binding curves. The uncorrected 
cc-pVQZ results were taken from Ref. The NPE refers to 
the non-parallelity error across the X X Y^ state up to geome- 
tries of 1.6A. The experimental dissociation energy was taken 
from Ref. 1681 and includes a correction to account for core- 
valence correlation. Random errors are not included in the 
F12 corrected values, but can be expected to have errors of 
at least the errors in the uncorrected energies. 



at the experimental equilibrium and fully dissociated ge- 
ometries, the error is reduced to 0.92(53)kcalmol _1 , with 
the approximate core-valence correction neglected. Al- 
though this improves the final dissociation energy to 
within 'chemical accuracy', it is yet to be seen whether 
remaining basis set error (such as a need for additional 
diffuse functions within the F12 framework), other ne- 
glected effects, or experimental ambiguity can account 
for the remaining discrepancy between the two values. 



However, it is clear from this, and other studies 68], that 
achieving chemical accuracy for this system is exceed- 
ingly difficult, where strong correlation, as well as signif- 
icant basis set incompleteness can cause severe errors. 



VI. SUMMARY AND CONCLUSIONS 

In this paper we use an a posteriori [2]ri2 an d [2]s 
approach [351 H3, [S3] m order to account for basis set in- 
completeness within FCIQMC. This allows for pertur- 
bative excitations from the sampled FCIQMC wavcfunc- 
tion into a space of strongly orthogonal geminal func- 
tions, via the two-body reduced density matrix which is 
accumulated on-the-fly. The basis set error in the one- 
particle space is accounted for via an adaptation of the 
[2]s method designed for incompleteness in CASSCF. It is 
shown that the need for an F12 correction which can cor- 
relate a multircfcrcnce zeroth order wavefunction, is cru- 
cial in strongly interacting systems. The convergence of 
the F12 properties with respect to both increasing walker 
number, and sampling time of the density matrix is inves- 
tigated, and found to converge at a similar rate to that of 
the projected energy estimate used in i-FCIQMC. This 
was the case for both the primarily dynamically corre- 
lated carbon dimer at equilibrium geometry, as well as 
the highly multiconfigurational stretched geometry. A 
relatively modest increase in computational effort was 
required to calculate these F12 corrections, and there- 
fore a large saving was achieved by using the corrections 
to reduce the basis set error with FCIQMC in converged 
calculations. 

This machinery was then used to treat the carbon 
dimer at a variety of bond lengths and basis sets, in order 
to quantify the benefit of the F12 corrections. This sys- 
tem was the subject of a previous investigation, where 
remaining basis set error, even in quadruple zeta basis 
sets, prevented convergence to within chemical accuracy. 
In keeping with other studies, the F12 corrections were 
found to provide a gain of close to two cardinal numbers 
in the basis set when considering non-parallelity errors 
and dissociation energies, and provided a balanced de- 
scription of both dynamic and strong correlation effects 
at equilibrium and stretched geometries. This enabled 
the calculation of the dissociation energy to within chem- 
ical accuracy once core electron correlation was explicitly 
taken into account. In the future, we hope this approach 
will allow us to tackle a range of strongly correlated sys- 
tems, without requiring large computational basis sets for 
the FCIQMC calculation. This approach however does 
not allow for a relaxation of the FCIQMC wavefunction 
due to the presence of the geminal functions. A future 
aim will be to compare this approach with an alternative 
a priori transcorrelation of the Hamiltonian[30, 53 , as 
well as more traditional CASPT2 and MRCI techniques. 
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